Conversation
Thanks @ValbuenaVC for picking this up! One improvement I had in mind was to create more strategies by running the different groups of jailbreaks we have in PyRIT. Right now I have only the one at the root of the directory, but we added quite a few more recently, and it would make sense to have one strategy per folder (and ALL to run them all).
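To make the one-strategy-per-folder idea concrete, here is a minimal sketch using only the standard library. The function name, the "root" and "all" keys, and the `*.yaml` layout are assumptions for illustration, not PyRIT's actual API or directory structure.

```python
from pathlib import Path
from typing import Dict, List

ALL = "all"  # hypothetical name for the aggregate strategy


def group_templates_by_folder(root: Path) -> Dict[str, List[Path]]:
    """Map each (hypothetical) strategy name to the YAML templates it would run."""
    groups: Dict[str, List[Path]] = {ALL: []}
    for path in sorted(root.rglob("*.yaml")):
        relative = path.relative_to(root)
        # Files directly under root go to an assumed "root" strategy;
        # anything deeper is keyed by its top-level folder name.
        name = relative.parts[0] if len(relative.parts) > 1 else "root"
        groups.setdefault(name, []).append(path)
        groups[ALL].append(path)
    return groups
```

A folder literally named "all" or "root" would collide with the reserved keys, so a real implementation would need to guard against that.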
# Will be resolved in _get_atomic_attacks_async
self._seed_groups: Optional[List[SeedAttackGroup]] = None

def _get_default_objective_scorer(self) -> TrueFalseScorer:
Not for this PR, but I wonder whether we should make _get_default_objective_scorer a non-abstract method on the base class.
@rlundeen2 are you suggesting we should move this function to the base class and let subclasses override it?
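For reference, the pattern in question could look roughly like the sketch below: the base class supplies a concrete default and subclasses override it only when needed. All class names here are stand-ins invented for illustration; they are not the actual PyRIT scenario or scorer classes.

```python
class Scorer:
    """Stand-in for a true/false scorer; not the real PyRIT class."""

    def __init__(self, name: str) -> None:
        self.name = name


class ScenarioBase:
    """Hypothetical base class with a concrete (non-abstract) default."""

    def _get_default_objective_scorer(self) -> Scorer:
        # Shared default that every scenario inherits unless it overrides.
        return Scorer("default")


class JailbreakLikeScenario(ScenarioBase):
    """Hypothetical subclass that needs its own scorer."""

    def _get_default_objective_scorer(self) -> Scorer:
        return Scorer("jailbreak-specific")


class PlainScenario(ScenarioBase):
    """Hypothetical subclass that is happy with the inherited default."""
```

The trade-off is that subclasses are no longer forced to think about their scorer, so the default has to be a sensible one.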
return list(seed_groups)

def _get_all_jailbreak_templates(self) -> List[str]:
I recommend using/extending the TextJailBreak class instead of looking for the yaml directly.
I also wonder if the scenario strategy could filter the jailbreaks further, so the set is not always "all". It could be a random N, a subcategory, or some other criterion.
This is probably important so we can have shorter or more targeted runs.
I was toying with this idea when writing the draft, and I believe the most reasonable option for this version is random N. Separately, I would like us to review all the jailbreak templates and reorganize/recategorize them (not just for the scenario strategy breakdown but for general usability), but I feel this is more of a v2 thing.
Added random jailbreak selection in latest commit
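For anyone following the thread, random-N selection can be sketched as below. This is not the code from the commit; the function name, the `n` and `seed` parameters, and the fall-back-to-all behavior are assumptions for illustration.

```python
import random
from typing import List, Optional, Sequence


def select_templates(
    templates: Sequence[str],
    n: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Return all templates, or a random sample of n of them without replacement."""
    if n is not None and n < 0:
        raise ValueError("n must be non-negative")
    # Assumed behavior: no limit, or a limit at least as large as the pool, means "all".
    if n is None or n >= len(templates):
        return list(templates)
    # A dedicated Random instance keeps runs reproducible when a seed is given
    # and avoids touching the global random state.
    rng = random.Random(seed)
    return rng.sample(list(templates), n)
```

Passing a seed makes a shorter run repeatable, which helps when comparing results across targets.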
)

# Create the attack
attack = PromptSendingAttack(
(not required) I wonder if we should offer an option to send each prompt multiple times.
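One way such an option could work is sketched below with a generic async callable standing in for the attack. The `send` callable, the function name, and the `attempts` parameter are all hypothetical; nothing here reflects the real PromptSendingAttack interface.

```python
import asyncio
from typing import Awaitable, Callable, List


async def send_repeatedly(
    send: Callable[[str], Awaitable[str]],
    prompt: str,
    attempts: int = 3,
) -> List[str]:
    """Send the same prompt `attempts` times and collect every response."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # gather schedules the sends concurrently and returns results in call order.
    return list(await asyncio.gather(*(send(prompt) for _ in range(attempts))))
```

Repeats matter mostly for non-deterministic targets, where a single send can under-report how often a jailbreak succeeds.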
Description
Addition of a jailbreak scenario to PyRIT, which applies jailbreak templates to a set of test prompts and sends them to the target. Credit to @fdubut for developing the scenario. Also made a minor change to pyrit.datasets.jailbreak.text_jailbreak.TextJailBreak to add a class method allowing for discovery of all jailbreak template files.
Tests and Documentation
Adding test_jailbreak.py under the unit tests.